library(topGO)
library(knitr)   # to use kable()
library(limma)   # to use vennDiagram()
library(ggplot2)
library(variancePartition)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
                   isoforms = '../2019-07-26/transcripts/annotation.txt')
DGEDIR <- '../2020-01-08/'
ENRICHMENT <- paste('../2020-01-14/Enrichment', TAG, VAR, 'RData', sep='.')

Introduction

This is the enrichment analysis for genes regulated by regime. I am using the topGO package (Alexa and Rahnenfuhrer 2019), which is able to apply different algorithms. The starting point is an ordering of genes usually according to their association with a phenotype or any other relevant quantity, such as a measure of differential expression between two conditions. In this case, we order genes by their differential expression between regime levels. Following one of the suggestions in the manual, I originally ordered genes by either \(p\) value of the differential expression analysis, or by the (complement of the) amount of its expression variance explained by regime (see 2020-01-14). Those were quite equivalent orderings. However, they do not distinguish the sign of the fold change: both up- and down-regulated genes contribute to make a gene ontology term significant. It is one of the approaches recommended, and it makes sense, to the extent that among all the genes annotated with a function, some may display contrasting expression patterns, for example if both positive and negative regulators of that function are annotated with the same label. I thus, expect the use of \(p\) values in enrichment analysis to facilitate the detection of high level categories.

In contrast, genes sharing lower level categories (very specific GO terms) are more likely to be expressed in a similar way. Then, an ordering of genes that reflect the sign of the difference in expression levels between the two conditions compared would be more informative. In principle, enrichment methods should be able to detect an enrichment in either end of the list, rather than only in the top. However, I noticed that the Kolmogorov-Smirnov test implemented in topGO only looks at the top of the list. Even if the score can be negative, it treats it as if it always was a p value, and orders genes from lowest to highest. Even if the function that selects for significant genes uses the absolute value of the score, the Kolmogorov-Smirnov test does not use any threshold, and ignores that detail.

In any case, I don’t want to use a different statistic based on a pre-selected subset of significant genes. In addition to that being an arbitrary decision, the ordering of genes would become folded again. I just need to run the enrichment analysis with the KS statistic twice: once with the t statistic, and once with its reverse, \(-t\). The first test will find GO terms overrepresented among the genes that are overexpressed in the regular, relative to the random environment. The second test will find GO terms overrepresented among genes overexpressed in the random environment.

Reading the data

Functional annotation is in 2019-07-26. I will also upload two lists of genes, with either proportion of variance explained by regime or p-value of differential expression test.

load(paste0(DGEDIR, TAG, '.RData'))
tagTStat    <- topTable(fitmm, coef = VAR, num = length(fitmm$F), sort.by='t', resort.by='t')[,'t',drop=FALSE]
Annotation  <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))

To initialize the topGOdata object, I need the gene list as a named numeric vector, where the names are the genes identifiers and the numeric values, the \(t\) statistics from the differential expression analysis. The structure() function adds attributes to an object.

TStats <- structure(tagTStat[,1], names = row.names(tagTStat))
TMinus <- -TStats
rm(tagTStat)

There are 18598 genes scored with a \(t\) statistic. The Annotation data frame has more than one GO term for every tag, separated by |. I need a named list.

head(Annotation)
##       tagname                                                           goterms
## 1 XLOC_000002                                  GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009                                  GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010                       GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021                       GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036                                  GO:0005272|GO:0006814|GO:0016020
allgenes2GO <- strsplit(as.character(Annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- Annotation$tagname
rm(Annotation)

There are 9691 genes with GO annotations. But the differential expression analysis includes many more genes. In order to include the not-annotated genes in the enrichment analysis, to see if annotated or not annotated genes are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.

for (tag in names(TStats)) {
   if (! tag %in% names(allgenes2GO)) {
      allgenes2GO <- append(allgenes2GO,
         structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
   }
}

Building the topGO objects

Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.

I generate three datasets, to analyse each of the three ontologies. These are the three topGO objects for the regular regime.

# with 'ks' statistics, the selection function only affects the display of
# presumed numbers of 'significant' genes, I think.
selection <- function(allScores) {return(allScores < -5.5)}
GOdata.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = TStats,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = TStats,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = TStats,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
   Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
   Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
ontology Num_Genes Num_GO_terms
BP 15162 1154
MF 17429 576
CC 12940 291

And the following three, for the random regime.

GOminus.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = TMinus,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOminus.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = TMinus,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOminus.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = TMinus,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

Running the tests

There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.

I start testing for enrichment of GO terms among genes overexpressed in the regular environment.

BP.reg.elim     <- runTest(GOdata.BP, algorithm = "elim",     statistic = "ks")
BP.reg.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.reg.lea      <- runTest(GOdata.BP, algorithm = "lea",      statistic = "ks")
MF.reg.elim     <- runTest(GOdata.MF, algorithm = "elim",     statistic = "ks")
MF.reg.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.reg.lea      <- runTest(GOdata.MF, algorithm = "lea",      statistic = "ks")
CC.reg.elim     <- runTest(GOdata.CC, algorithm = "elim",     statistic = "ks")
CC.reg.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.reg.lea      <- runTest(GOdata.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BP.reg.elim, BP.reg.weight01, BP.reg.lea,
                             MF.reg.elim, MF.reg.weight01, MF.reg.lea,
                             CC.reg.elim, CC.reg.weight01, CC.reg.lea), function(X) length(score(X))),
   Significant = sapply(list(BP.reg.elim, BP.reg.weight01, BP.reg.lea,
                             MF.reg.elim, MF.reg.weight01, MF.reg.lea,
                             CC.reg.elim, CC.reg.weight01, CC.reg.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 31
BP weight01 1154 32
BP lea 1154 46
MF elim 576 47
MF weight01 576 42
MF lea 576 68
CC elim 291 17
CC weight01 291 17
CC lea 291 30
rm(ResultsSummary)

And now in the random environment:

BP.ran.elim     <- runTest(GOminus.BP, algorithm = "elim",     statistic = "ks")
BP.ran.weight01 <- runTest(GOminus.BP, algorithm = "weight01", statistic = "ks")
BP.ran.lea      <- runTest(GOminus.BP, algorithm = "lea",      statistic = "ks")
MF.ran.elim     <- runTest(GOminus.MF, algorithm = "elim",     statistic = "ks")
MF.ran.weight01 <- runTest(GOminus.MF, algorithm = "weight01", statistic = "ks")
MF.ran.lea      <- runTest(GOminus.MF, algorithm = "lea",      statistic = "ks")
CC.ran.elim     <- runTest(GOminus.CC, algorithm = "elim",     statistic = "ks")
CC.ran.weight01 <- runTest(GOminus.CC, algorithm = "weight01", statistic = "ks")
CC.ran.lea      <- runTest(GOminus.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BP.ran.elim, BP.ran.weight01, BP.ran.lea,
                             MF.ran.elim, MF.ran.weight01, MF.ran.lea,
                             CC.ran.elim, CC.ran.weight01, CC.ran.lea), function(X) length(score(X))),
   Significant = sapply(list(BP.ran.elim, BP.ran.weight01, BP.ran.lea,
                             MF.ran.elim, MF.ran.weight01, MF.ran.lea,
                             CC.ran.elim, CC.ran.weight01, CC.ran.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 76
BP weight01 1154 53
BP lea 1154 151
MF elim 576 32
MF weight01 576 39
MF lea 576 51
CC elim 291 43
CC weight01 291 27
CC lea 291 69
rm(ResultsSummary)

Functional enrichment in predictable environment

Biological process

orderedTerms <- names(sort(score(BP.reg.weight01)))
significant.weight01 <- score(BP.reg.weight01)[orderedTerms] <= 0.005
significant.lea      <- score(BP.reg.lea)[orderedTerms] <= 0.005
significant.elim     <- score(BP.reg.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.reg.sigTerms <- sigTerms

BP.all <- GenTable(GOdata.BP, elim=BP.reg.elim, weight01=BP.reg.weight01, lea=BP.reg.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   BP.all[BP.all$GO.ID %in% sigTerms,],
   caption = "Most overrepresented Biological Process terms in predictable environments")
Most overrepresented Biological Process terms in predictable environments
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0007186 G protein-coupled receptor signaling pat… 206 1 0.76 1 1.3e-26 4.4e-27 1.3e-26
2 GO:0006508 proteolysis 442 4 1.63 3 8.1e-10 8.4e-19 8.1e-10
3 GO:0006814 sodium ion transport 88 0 0.33 2 5.8e-10 5.8e-10 5.8e-10
4 GO:0006030 chitin metabolic process 74 3 0.27 4 1.2e-09 3.0e-09 1.2e-09
5 GO:0006813 potassium ion transport 56 0 0.21 5 1.4e-07 1.4e-07 8.8e-10
6 GO:0034220 ion transmembrane transport 147 0 0.54 10 2.6e-05 1.9e-07 2.2e-08
7 GO:0055085 transmembrane transport 500 2 1.85 7 1.2e-06 4.2e-07 6.9e-13
8 GO:0006486 protein glycosylation 75 0 0.28 11 0.00011 6.6e-07 0.00011
9 GO:0005975 carbohydrate metabolic process 202 6 0.75 18 0.00121 3.3e-06 0.00024
10 GO:0060271 cilium assembly 46 0 0.17 6 9.5e-07 7.6e-06 1.4e-08
11 GO:0015074 DNA integration 34 0 0.13 8 1.5e-05 1.5e-05 1.5e-05
12 GO:0006811 ion transport 436 0 1.61 12 0.00014 2.6e-05 7.0e-19
14 GO:0007156 homophilic cell adhesion via plasma memb… 21 0 0.08 13 0.00015 0.00015 0.00015
15 GO:0007160 cell-matrix adhesion 16 0 0.06 14 0.00020 0.00020 0.00020
16 GO:0003341 cilium movement 10 0 0.04 15 0.00026 0.00026 0.00026
18 GO:0070588 calcium ion transmembrane transport 30 0 0.11 16 0.00101 0.00101 0.00101
19 GO:0005992 trehalose biosynthetic process 6 2 0.02 17 0.00110 0.00110 0.00110
20 GO:0007154 cell communication 565 1 2.09 9 1.9e-05 0.00132 0.00016
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term = Term(GOTERM[sigTerms]),
                 Definition = Definition(GOTERM[sigTerms]),
                 PValue=score(BP.reg.weight01)[sigTerms]),
      caption = 'BP terms overrepresented in predictable environments')
BP terms overrepresented in predictable environments
Term Definition PValue
GO:0007186 G protein-coupled receptor signaling pathway A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). 0.0000000
GO:0006508 proteolysis The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. 0.0000000
GO:0006814 sodium ion transport The directed movement of sodium ions (Na+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0000000
GO:0006030 chitin metabolic process The chemical reactions and pathways involving chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. 0.0000000
GO:0006813 potassium ion transport The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0000001
GO:0034220 ion transmembrane transport A process in which an ion is transported across a membrane. 0.0000002
GO:0055085 transmembrane transport The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. 0.0000004
GO:0006486 protein glycosylation A protein modification process that results in the addition of a carbohydrate or carbohydrate derivative unit to a protein amino acid, e.g. the addition of glycan chains to proteins. 0.0000007
GO:0005975 carbohydrate metabolic process The chemical reactions and pathways involving carbohydrates, any of a group of organic compounds based of the general formula Cx(H2O)y. Includes the formation of carbohydrate derivatives by the addition of a carbohydrate residue to another molecule. 0.0000033
GO:0060271 cilium assembly The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. 0.0000076
GO:0015074 DNA integration The process in which a segment of DNA is incorporated into another, usually larger, DNA molecule such as a chromosome. 0.0000147
GO:0006811 ion transport The directed movement of charged atoms or small charged molecules into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0000263
GO:0007156 homophilic cell adhesion via plasma membrane adhesion molecules The attachment of a plasma membrane adhesion molecule in one cell to an identical molecule in an adjacent cell. 0.0001501
GO:0007160 cell-matrix adhesion The binding of a cell to the extracellular matrix via adhesion molecules. 0.0001997
GO:0003341 cilium movement The directed, self-propelled movement of a cilium. 0.0002568
GO:0070588 calcium ion transmembrane transport A process in which a calcium ion is transported from one side of a membrane to the other by means of some agent such as a transporter or pore. 0.0010143
GO:0005992 trehalose biosynthetic process The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. 0.0010957
GO:0007154 cell communication Any process that mediates interactions between a cell and its surroundings. Encompasses interactions such as signaling or attachment between one cell and another cell, between a cell and an extracellular matrix, or between a cell and any other aspect of its environment. 0.0013231
GO:0007018 microtubule-based movement A microtubule-based process that results in the movement of organelles, other microtubules, or other cellular components. Examples include motor-driven movement along microtubules and movement driven by polymerization or depolymerization of microtubules. 0.0036411

I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.

showSigOfNodes(GOdata.BP, score(BP.reg.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 130 
## Number of Edges = 236 
## 
## $complete.dag
## [1] "A graph with 130 nodes."

This is just a example of the most significant GO term:

showGroupDensity(GOdata.BP, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.BP, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.BP, orderedTerms[3], rm.one=FALSE)

Molecular function

orderedTerms <- names(sort(score(MF.reg.weight01)))
significant.weight01 <- score(MF.reg.weight01)[orderedTerms] <= 0.005
significant.lea      <- score(MF.reg.lea)[orderedTerms] <= 0.005
significant.elim     <- score(MF.reg.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.reg.sigTerms <- sigTerms

MF.all <- GenTable(GOdata.MF, elim=MF.reg.elim, weight01=MF.reg.weight01, lea=MF.reg.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   MF.all[MF.all$GO.ID %in% sigTerms,],
   caption = "Most overrepresented Molecular Function terms in predictable environments.")
Most overrepresented Molecular Function terms in predictable environments.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0005509 calcium ion binding 361 0 1.43 1 < 1e-30 < 1e-30 < 1e-30
2 GO:0004930 G protein-coupled receptor activity 180 1 0.71 2 5.7e-23 9.3e-22 1.3e-26
3 GO:0043565 sequence-specific DNA binding 146 0 0.58 4 2.0e-09 1.4e-11 2.0e-09
4 GO:0008061 chitin binding 77 2 0.30 3 1.1e-09 1.1e-09 1.1e-09
5 GO:0005272 sodium channel activity 73 0 0.29 5 7.0e-09 7.0e-09 7.0e-09
6 GO:0004252 serine-type endopeptidase activity 88 2 0.35 6 2.3e-08 2.3e-08 2.3e-08
7 GO:0003774 motor activity 131 0 0.52 7 1.3e-07 1.3e-07 2.9e-07
8 GO:0004181 metallocarboxypeptidase activity 19 0 0.08 8 2.2e-07 2.2e-07 2.2e-07
9 GO:0005515 protein binding 2101 10 8.32 17 0.00070 1.4e-06 0.00128
10 GO:0005249 voltage-gated potassium channel activity 27 0 0.11 9 1.7e-06 1.7e-06 1.7e-06
11 GO:0004888 transmembrane signaling receptor activit… 251 1 0.99 11 6.3e-06 6.1e-06 3.6e-30
12 GO:0008417 fucosyltransferase activity 33 0 0.13 12 8.8e-06 8.8e-06 8.8e-06
13 GO:0005230 extracellular ligand-gated ion channel a… 54 0 0.21 10 1.7e-06 1.0e-05 9.2e-08
14 GO:0005200 structural constituent of cytoskeleton 17 2 0.07 13 0.00023 0.00023 0.00023
15 GO:0004222 metalloendopeptidase activity 73 1 0.29 14 0.00031 0.00031 0.00031
16 GO:0005328 neurotransmitter:sodium symporter activi… 18 0 0.07 15 0.00055 0.00055 0.00055
17 GO:0051015 actin filament binding 30 0 0.12 16 0.00070 0.00070 0.00070
18 GO:0003777 microtubule motor activity 98 0 0.39 20 0.00085 0.00085 0.00085
19 GO:0030248 cellulose binding 7 1 0.03 21 0.00090 0.00090 0.00090
20 GO:0020037 heme binding 95 1 0.38 23 0.00112 0.00112 0.00112
21 GO:0016715 oxidoreductase activity, acting on paire… 14 1 0.06 24 0.00123 0.00123 0.00123
22 GO:0004096 catalase activity 15 0 0.06 25 0.00156 0.00156 0.00156
23 GO:0031409 pigment binding 10 1 0.04 28 0.00172 0.00172 0.00172
25 GO:0004983 neuropeptide Y receptor activity 9 0 0.04 29 0.00188 0.00188 0.00188
26 GO:0015293 symporter activity 32 0 0.13 34 0.00447 0.00238 1.6e-05
27 GO:0008378 galactosyltransferase activity 23 0 0.09 30 0.00256 0.00256 0.00256
28 GO:0004601 peroxidase activity 34 1 0.13 31 0.00300 0.00300 2.8e-05
29 GO:0004190 aspartic-type endopeptidase activity 28 0 0.11 32 0.00306 0.00306 0.00306
31 GO:0008528 G protein-coupled peptide receptor activ… 15 0 0.06 33 0.00428 0.00428 1.6e-05
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MF.reg.weight01)[sigTerms]),
      caption = 'MF terms overrepresented in predictable environments')
MF terms overrepresented in predictable environments
Term Definition PValue
GO:0005509 calcium ion binding Interacting selectively and non-covalently with calcium ions (Ca2+). 0.0000000
GO:0004930 G protein-coupled receptor activity Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. 0.0000000
GO:0043565 sequence-specific DNA binding Interacting selectively and non-covalently with DNA of a specific nucleotide composition, e.g. GC-rich DNA binding, or with a specific sequence motif or type of DNA e.g. promotor binding or rDNA binding. 0.0000000
GO:0008061 chitin binding Interacting selectively and non-covalently with chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. 0.0000000
GO:0005272 sodium channel activity Enables the facilitated diffusion of a sodium ion (by an energy-independent process) involving passage through a transmembrane aqueous pore or channel without evidence for a carrier-mediated mechanism. 0.0000000
GO:0004252 serine-type endopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). 0.0000000
GO:0003774 motor activity Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. 0.0000001
GO:0004181 metallocarboxypeptidase activity Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0000002
GO:0005515 protein binding Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). 0.0000014
GO:0005249 voltage-gated potassium channel activity Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. 0.0000017
GO:0004888 transmembrane signaling receptor activity Combining with an extracellular or intracellular signal and transmitting the signal from one side of the membrane to the other to initiate a change in cell activity or state as part of signal transduction. 0.0000061
GO:0008417 fucosyltransferase activity Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. 0.0000088
GO:0005230 extracellular ligand-gated ion channel activity Enables the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. 0.0000100
GO:0005200 structural constituent of cytoskeleton The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. 0.0002305
GO:0004222 metalloendopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0003142
GO:0005328 neurotransmitter:sodium symporter activity Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: neurotransmitter(out) + Na+(out) = neurotransmitter(in) + Na+(in). 0.0005460
GO:0051015 actin filament binding Interacting selectively and non-covalently with an actin filament, also known as F-actin, a helical filamentous polymer of globular G-actin subunits. 0.0006982
GO:0003777 microtubule motor activity Catalysis of movement along a microtubule, coupled to the hydrolysis of a nucleoside triphosphate (usually ATP). 0.0008454
GO:0030248 cellulose binding Interacting selectively and non-covalently with cellulose. 0.0009042
GO:0020037 heme binding Interacting selectively and non-covalently with heme, any compound of iron complexed in a porphyrin (tetrapyrrole) ring. 0.0011192
GO:0016715 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen Catalysis of an oxidation-reduction (redox) reaction in which hydrogen or electrons are transferred from reduced ascorbate and one other donor, and one atom of oxygen is incorporated into one donor. 0.0012308
GO:0004096 catalase activity Catalysis of the reaction: 2 hydrogen peroxide = O2 + 2 H2O. 0.0015600
GO:0031409 pigment binding Interacting selectively and non-covalently with a pigment, any general or particular coloring matter in living organisms, e.g. melanin. 0.0017172
GO:0004983 neuropeptide Y receptor activity Combining with neuropeptide Y to initiate a change in cell activity. 0.0018782
GO:0015293 symporter activity Enables the active transport of a solute across a membrane by a mechanism whereby two or more species are transported together in the same direction in a tightly coupled process not directly linked to a form of energy other than chemiosmotic energy. 0.0023817
GO:0008378 galactosyltransferase activity Catalysis of the transfer of a galactosyl group to an acceptor molecule, typically another carbohydrate or a lipid. 0.0025622
GO:0004601 peroxidase activity Catalysis of the reaction: donor + hydrogen peroxide = oxidized donor + 2 H2O. 0.0030028
GO:0004190 aspartic-type endopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which a water molecule bound by the side chains of aspartic residues at the active center acts as a nucleophile. 0.0030648
GO:0008528 G protein-coupled peptide receptor activity Combining with a peptide and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. 0.0042829
showSigOfNodes(GOdata.MF, score(MF.reg.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 116 
## Number of Edges = 151 
## 
## $complete.dag
## [1] "A graph with 116 nodes."
showGroupDensity(GOdata.MF, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.MF, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.MF, orderedTerms[3], rm.one=FALSE)

Cellular component

orderedTerms <- names(sort(score(CC.reg.weight01)))
significant.weight01 <- score(CC.reg.weight01)[orderedTerms] <= 0.005
significant.lea      <- score(CC.reg.lea)[orderedTerms] <= 0.005
significant.elim     <- score(CC.reg.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.reg.sigTerms <- sigTerms

CC.all <- GenTable(GOdata.CC, elim=CC.reg.elim, weight01=CC.reg.weight01, lea=CC.reg.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
   CC.all[CC.all$GO.ID %in% sigTerms,],
   caption = "Most overrepresented Cellular Component terms in predictable environments")
Most overrepresented Cellular Component terms in predictable environments
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0016021 integral component of membrane 885 1 2.60 1 < 1e-30 < 1e-30 < 1e-30
GO:0016020 membrane 1565 3 4.60 3 6.9e-12 1.5e-19 < 1e-30
GO:0005576 extracellular region 140 4 0.41 2 4.1e-15 7.5e-14 1.2e-17
GO:0016459 myosin complex 33 0 0.10 4 5.2e-08 5.2e-08 5.2e-08
GO:0008076 voltage-gated potassium channel complex 13 0 0.04 5 3.2e-06 3.2e-06 3.2e-06
GO:0005886 plasma membrane 170 0 0.50 6 0.00019 0.00016 2.1e-12
GO:0098797 plasma membrane protein complex 34 0 0.10 8 0.00033 0.00049 3.7e-09
GO:0031012 extracellular matrix 9 0 0.03 9 0.00081 0.00081 0.00081
GO:0034704 calcium channel complex 5 0 0.01 10 0.00185 0.00185 0.00185
GO:0005874 microtubule 29 2 0.09 11 0.00221 0.00221 0.00221
GO:0005929 cilium 27 0 0.08 12 0.00279 0.00279 9.8e-08
GO:0042555 MCM complex 10 3 0.03 14 0.00302 0.00302 0.00302
GO:0005858 axonemal dynein complex 5 0 0.01 15 0.00371 0.00371 0.00371
GO:0045211 postsynaptic membrane 16 0 0.05 16 0.00428 0.00428 0.00428
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CC.reg.weight01)[sigTerms]),
      caption = 'CC terms overrepresented in predictable environments')
CC terms overrepresented in predictable environments
Term Definition PValue
GO:0016021 integral component of membrane The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0000000
GO:0016020 membrane A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. 0.0000000
GO:0005576 extracellular region The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. 0.0000000
GO:0016459 myosin complex A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. 0.0000001
GO:0008076 voltage-gated potassium channel complex A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. 0.0000032
GO:0005886 plasma membrane The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. 0.0001593
GO:0098797 plasma membrane protein complex Any protein complex that is part of the plasma membrane. 0.0004918
GO:0031012 extracellular matrix A structure lying external to one or more cells, which provides structural support, biochemical or biomechanical cues for cells or tissues. 0.0008138
GO:0034704 calcium channel complex An ion channel complex through which calcium ions pass. 0.0018495
GO:0005874 microtubule Any of the long, generally straight, hollow tubes of internal diameter 12-15 nm and external diameter 24 nm found in a wide variety of eukaryotic cells; each consists (usually) of 13 protofilaments of polymeric tubulin, staggered in such a manner that the tubulin monomers are arranged in a helical pattern on the microtubular surface, and with the alpha/beta axes of the tubulin subunits parallel to the long axis of the tubule; exist in equilibrium with pool of tubulin monomers and can be rapidly assembled or disassembled in response to physiological stimuli; concerned with force generation, e.g. in the spindle. 0.0022146
GO:0005929 cilium A specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface and of some cytoplasmic parts. Each cilium is largely bounded by an extrusion of the cytoplasmic (plasma) membrane, and contains a regular longitudinal array of microtubules, anchored to a basal body. 0.0027889
GO:0042555 MCM complex A hexameric protein complex required for the initiation and regulation of DNA replication. 0.0030211
GO:0005858 axonemal dynein complex A dynein complex found in eukaryotic cilia and flagella; the motor domain heads interact with adjacent microtubules to generate a sliding force which is converted to a bending motion. 0.0037087
GO:0045211 postsynaptic membrane A specialized area of membrane facing the presynaptic membrane on the tip of the nerve ending and separated from it by a minute cleft (the synaptic cleft). Neurotransmitters cross the synaptic cleft and transmit the signal to the postsynaptic membrane. 0.0042845
showSigOfNodes(GOdata.CC, score(CC.reg.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 68 
## Number of Edges = 117 
## 
## $complete.dag
## [1] "A graph with 68 nodes."
showGroupDensity(GOdata.CC, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.CC, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.CC, orderedTerms[3], rm.one=FALSE)

Functional enrichment in the unpredictable environment

Biological process

orderedTerms <- names(sort(score(BP.ran.weight01)))
significant.weight01 <- score(BP.ran.weight01)[orderedTerms] <= 0.005
significant.lea      <- score(BP.ran.lea)[orderedTerms] <= 0.005
significant.elim     <- score(BP.ran.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.ran.sigTerms <- sigTerms

BP.all <- GenTable(GOminus.BP, elim=BP.ran.elim, weight01=BP.ran.weight01, lea=BP.ran.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   BP.all[BP.all$GO.ID %in% sigTerms,],
   caption = "Most overrepresented Biological Process terms in random environments.")
Most overrepresented Biological Process terms in random environments.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0006412 translation 195 0 0.98 1 1.5e-15 3.8e-21 4.5e-18
2 GO:0006511 ubiquitin-dependent protein catabolic pr… 102 4 0.51 2 4.5e-10 8.6e-14 4.5e-10
3 GO:0016579 protein deubiquitination 40 3 0.20 4 2.1e-09 2.1e-09 2.1e-09
4 GO:0000398 mRNA splicing, via spliceosome 46 0 0.23 5 4.1e-07 7.4e-08 1.8e-10
5 GO:0006397 mRNA processing 87 1 0.44 3 1.3e-09 1.4e-07 1.3e-17
6 GO:0006289 nucleotide-excision repair 13 0 0.07 6 4.2e-07 4.2e-07 4.2e-07
7 GO:0006351 transcription, DNA-templated 442 1 2.22 26 0.00120 5.0e-07 1.9e-09
8 GO:0016567 protein ubiquitination 46 2 0.23 8 4.7e-06 4.7e-06 4.7e-06
9 GO:0006396 RNA processing 225 2 1.13 7 8.6e-07 7.8e-06 1.3e-29
10 GO:0006886 intracellular protein transport 141 2 0.71 9 9.6e-06 8.3e-06 2.6e-09
11 GO:0006914 autophagy 21 0 0.11 11 1.5e-05 1.5e-05 2.9e-07
12 GO:0006367 transcription initiation from RNA polyme… 13 0 0.07 12 1.8e-05 1.8e-05 1.8e-05
13 GO:0006352 DNA-templated transcription, initiation 42 0 0.21 14 4.0e-05 4.0e-05 2.9e-07
15 GO:0030163 protein catabolic process 123 4 0.62 15 0.00011 5.9e-05 4.7e-13
16 GO:0006383 transcription by RNA polymerase III 8 0 0.04 17 0.00015 0.00015 0.00015
17 GO:0006413 translational initiation 18 0 0.09 20 0.00026 0.00026 0.00026
18 GO:0043628 ncRNA 3’-end processing 6 0 0.03 21 0.00031 0.00031 0.00031
19 GO:0000184 nuclear-transcribed mRNA catabolic proce… 6 0 0.03 22 0.00046 0.00046 0.00046
20 GO:0000245 spliceosomal complex assembly 7 0 0.04 23 0.00053 0.00053 0.00053
21 GO:0016570 histone modification 30 0 0.15 16 0.00012 0.00076 6.9e-07
22 GO:0006281 DNA repair 125 0 0.63 10 1.0e-05 0.00094 2.9e-07
23 GO:0006368 transcription elongation from RNA polyme… 8 0 0.04 28 0.00141 0.00141 0.00141
24 GO:0002098 tRNA wobble uridine modification 10 1 0.05 29 0.00152 0.00152 0.00152
25 GO:0006744 ubiquinone biosynthetic process 5 0 0.03 31 0.00155 0.00155 0.00155
26 GO:0016573 histone acetylation 12 0 0.06 32 0.00157 0.00157 0.00157
27 GO:0016236 macroautophagy 6 0 0.03 33 0.00170 0.00170 0.00170
28 GO:0043631 RNA polyadenylation 10 0 0.05 35 0.00182 0.00182 0.00182
29 GO:0035304 regulation of protein dephosphorylation 6 0 0.03 37 0.00213 0.00213 0.00213
30 GO:0045944 positive regulation of transcription by … 6 0 0.03 39 0.00223 0.00223 0.00223
32 GO:0008033 tRNA processing 50 1 0.25 46 0.00317 0.00317 3.9e-07
33 GO:0001510 RNA methylation 17 0 0.09 19 0.00016 0.00350 0.00016
34 GO:0071985 multivesicular body sorting pathway 6 0 0.03 49 0.00414 0.00414 0.00414
35 GO:0006888 ER to Golgi vesicle-mediated transport 16 0 0.08 52 0.00449 0.00449 0.00449
37 GO:0043094 cellular metabolic compound salvage 7 1 0.04 53 0.00471 0.00471 0.00471
38 GO:0048015 phosphatidylinositol-mediated signaling 9 0 0.05 56 0.00479 0.00479 0.00479
39 GO:0009894 regulation of catabolic process 5 0 0.03 57 0.00484 0.00484 0.00484
40 GO:0016226 iron-sulfur cluster assembly 8 0 0.04 58 0.00489 0.00489 0.00489
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term = Term(GOTERM[sigTerms]),
                 Definition = Definition(GOTERM[sigTerms]),
                 PValue=score(BP.ran.weight01)[sigTerms]),
      caption = 'BP terms overrepresented in random environments')
BP terms overrepresented in random environments
Term Definition PValue
GO:0006412 translation The cellular metabolic process in which a protein is formed, using the sequence of a mature mRNA or circRNA molecule to specify the sequence of amino acids in a polypeptide chain. Translation is mediated by the ribosome, and begins with the formation of a ternary complex between aminoacylated initiator methionine tRNA, GTP, and initiation factor 2, which subsequently associates with the small subunit of the ribosome and an mRNA or circRNA. Translation ends with the release of a polypeptide chain from the ribosome. 0.0000000
GO:0006511 ubiquitin-dependent protein catabolic process The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of a ubiquitin group, or multiple ubiquitin groups, to the protein. 0.0000000
GO:0016579 protein deubiquitination The removal of one or more ubiquitin groups from a protein. 0.0000000
GO:0000398 mRNA splicing, via spliceosome The joining together of exons from one or more primary transcripts of messenger RNA (mRNA) and the excision of intron sequences, via a spliceosomal mechanism, so that mRNA consisting only of the joined exons is produced. 0.0000001
GO:0006397 mRNA processing Any process involved in the conversion of a primary mRNA transcript into one or more mature mRNA(s) prior to translation into polypeptide. 0.0000001
GO:0006289 nucleotide-excision repair A DNA repair process in which a small region of the strand surrounding the damage is removed from the DNA helix as an oligonucleotide. The small gap left in the DNA helix is filled in by the sequential action of DNA polymerase and DNA ligase. Nucleotide excision repair recognizes a wide range of substrates, including damage caused by UV irradiation (pyrimidine dimers and 6-4 photoproducts) and chemicals (intrastrand cross-links and bulky adducts). 0.0000004
GO:0006351 transcription, DNA-templated The cellular synthesis of RNA on a template of DNA. 0.0000005
GO:0016567 protein ubiquitination The process in which one or more ubiquitin groups are added to a protein. 0.0000047
GO:0006396 RNA processing Any process involved in the conversion of one or more primary RNA transcripts into one or more mature RNA molecules. 0.0000078
GO:0006886 intracellular protein transport The directed movement of proteins in a cell, including the movement of proteins between specific compartments or structures within a cell, such as organelles of a eukaryotic cell. 0.0000083
GO:0006914 autophagy The cellular catabolic process in which cells digest parts of their own cytoplasm; allows for both recycling of macromolecular constituents under conditions of cellular stress and remodeling the intracellular structure for cell differentiation. 0.0000150
GO:0006367 transcription initiation from RNA polymerase II promoter Any process involved in the assembly of the RNA polymerase II preinitiation complex (PIC) at an RNA polymerase II promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter. The initiation phase includes PIC assembly and the formation of the first few bonds in the RNA chain, including abortive initiation, which occurs when the first few nucleotides are repeatedly synthesized and then released. Promoter clearance, or release, is the transition between the initiation and elongation phases of transcription. 0.0000178
GO:0006352 DNA-templated transcription, initiation Any process involved in the assembly of the RNA polymerase preinitiation complex (PIC) at the core promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter. The initiation phase includes PIC assembly and the formation of the first few bonds in the RNA chain, including abortive initiation, which occurs when the first few nucleotides are repeatedly synthesized and then released. The initiation phase ends just before and does not include promoter clearance, or release, which is the transition between the initiation and elongation phases of transcription. 0.0000402
GO:0030163 protein catabolic process The chemical reactions and pathways resulting in the breakdown of a protein by the destruction of the native, active configuration, with or without the hydrolysis of peptide bonds. 0.0000588
GO:0006383 transcription by RNA polymerase III The synthesis of RNA from a DNA template by RNA polymerase III, originating at an RNAP III promoter. 0.0001480
GO:0006413 translational initiation The process preceding formation of the peptide bond between the first two amino acids of a protein. This includes the formation of a complex of the ribosome, mRNA or circRNA, and an initiation complex that contains the first aminoacyl-tRNA. 0.0002606
GO:0043628 ncRNA 3’-end processing Any process involved in forming the mature 3’ end of a non-coding RNA molecule. 0.0003081
GO:0000184 nuclear-transcribed mRNA catabolic process, nonsense-mediated decay The nonsense-mediated decay pathway for nuclear-transcribed mRNAs degrades mRNAs in which an amino-acid codon has changed to a nonsense codon; this prevents the translation of such mRNAs into truncated, and potentially harmful, proteins. 0.0004563
GO:0000245 spliceosomal complex assembly The aggregation, arrangement and bonding together of a spliceosomal complex, a ribonucleoprotein apparatus that catalyzes nuclear mRNA splicing via transesterification reactions. 0.0005274
GO:0016570 histone modification The covalent alteration of one or more amino acid residues within a histone protein. 0.0007554
GO:0006281 DNA repair The process of restoring DNA after damage. Genomes are subject to damage by chemical and physical agents in the environment (e.g. UV and ionizing radiations, chemical mutagens, fungal and bacterial toxins, etc.) and by free radicals or alkylating agents endogenously generated in metabolism. DNA is also damaged because of errors during its replication. A variety of different DNA repair pathways have been reported that include direct reversal, base excision repair, nucleotide excision repair, photoreactivation, bypass, double-strand break repair pathway, and mismatch repair pathway. 0.0009435
GO:0006368 transcription elongation from RNA polymerase II promoter The extension of an RNA molecule after transcription initiation and promoter clearance at an RNA polymerase II promoter by the addition of ribonucleotides catalyzed by RNA polymerase II. 0.0014137
GO:0002098 tRNA wobble uridine modification The process in which a uridine in position 34 of a tRNA is post-transcriptionally modified. 0.0015167
GO:0006744 ubiquinone biosynthetic process The chemical reactions and pathways resulting in the formation of ubiquinone, a lipid-soluble electron-transporting coenzyme. 0.0015472
GO:0016573 histone acetylation The modification of a histone by the addition of an acetyl group. 0.0015664
GO:0016236 macroautophagy The major inducible pathway for the general turnover of cytoplasmic constituents in eukaryotic cells, it is also responsible for the degradation of active cytoplasmic enzymes and organelles during nutrient starvation. Macroautophagy involves the formation of double-membrane-bounded autophagosomes which enclose the cytoplasmic constituent targeted for degradation in a membrane-bounded structure. Autophagosomes then fuse with a lysosome (or vacuole) releasing single-membrane-bounded autophagic bodies that are then degraded within the lysosome (or vacuole). Some types of macroautophagy, e.g. pexophagy, mitophagy, involve selective targeting of the targets to be degraded. 0.0016976
GO:0043631 RNA polyadenylation The enzymatic addition of a sequence of adenylyl residues at the 3’ end of an RNA molecule. 0.0018202
GO:0035304 regulation of protein dephosphorylation Any process that modulates the frequency, rate or extent of removal of phosphate groups from a protein. 0.0021291
GO:0045944 positive regulation of transcription by RNA polymerase II Any process that activates or increases the frequency, rate or extent of transcription from an RNA polymerase II promoter. 0.0022302
GO:0008033 tRNA processing The process in which a pre-tRNA molecule is converted to a mature tRNA, ready for addition of an aminoacyl group. 0.0031681
GO:0001510 RNA methylation Posttranscriptional addition of a methyl group to either a nucleotide or 2’-O ribose in a polyribonucleotide. Usually uses S-adenosylmethionine as a cofactor. 0.0035013
GO:0071985 multivesicular body sorting pathway A vesicle-mediated transport process in which transmembrane proteins are ubiquitylated to facilitate their entry into luminal vesicles of multivesicular bodies (MVBs); upon subsequent fusion of MVBs with lysosomes or vacuoles, the cargo proteins are degraded. 0.0041427
GO:0006888 ER to Golgi vesicle-mediated transport The directed movement of substances from the endoplasmic reticulum (ER) to the Golgi, mediated by COP II vesicles. Small COP II coated vesicles form from the ER and then fuse directly with the cis-Golgi. Larger structures are transported along microtubules to the cis-Golgi. 0.0044884
GO:0043094 cellular metabolic compound salvage Any process which produces a useful metabolic compound from derivatives of it without de novo synthesis, as carried out by individual cells. 0.0047084
GO:0048015 phosphatidylinositol-mediated signaling A series of molecular signals in which a cell uses a phosphatidylinositol-mediated signaling to convert a signal into a response. Phosphatidylinositols include phosphatidylinositol (PtdIns) and its phosphorylated derivatives. 0.0047888
GO:0009894 regulation of catabolic process Any process that modulates the frequency, rate, or extent of the chemical reactions and pathways resulting in the breakdown of substances. 0.0048416
GO:0016226 iron-sulfur cluster assembly The incorporation of iron and exogenous sulfur into a metallo-sulfur cluster. 0.0048899

I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.

showSigOfNodes(GOminus.BP, score(BP.ran.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 279 
## Number of Edges = 535 
## 
## $complete.dag
## [1] "A graph with 279 nodes."

This is just a example of the most significant GO term:

showGroupDensity(GOminus.BP, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOminus.BP, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOminus.BP, orderedTerms[3], rm.one=FALSE)

Molecular function

orderedTerms <- names(sort(score(MF.ran.weight01)))
significant.weight01 <- score(MF.ran.weight01)[orderedTerms] <= 0.005
significant.lea      <- score(MF.ran.lea)[orderedTerms] <= 0.005
significant.elim     <- score(MF.ran.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.ran.sigTerms <- sigTerms

MF.all <- GenTable(GOminus.MF, elim=MF.ran.elim, weight01=MF.ran.weight01, lea=MF.ran.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   MF.all[MF.all$GO.ID %in% sigTerms,],
   caption = "Most significant terms of the Molecular Function ontology.")
Most significant terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0003735 structural constituent of ribosome 113 0 0.57 1 5.2e-23 5.2e-23 5.2e-23
2 GO:0003723 RNA binding 250 1 1.26 2 3.1e-12 4.2e-12 6.8e-18
3 GO:0036459 thiol-dependent ubiquitinyl hydrolase ac… 43 3 0.22 3 6.5e-08 6.5e-08 1.1e-10
4 GO:0003676 nucleic acid binding 1217 4 6.14 7 1.9e-06 7.2e-08 9.7e-07
5 GO:0061630 ubiquitin protein ligase activity 27 1 0.14 4 2.1e-07 2.1e-07 2.1e-07
6 GO:0003712 transcription coregulator activity 42 0 0.21 6 7.5e-07 2.4e-07 7.5e-07
7 GO:0004842 ubiquitin-protein transferase activity 76 2 0.38 5 2.6e-07 2.6e-07 3.9e-11
8 GO:0003899 DNA-directed 5’-3’ RNA polymerase activi… 27 0 0.14 8 2.6e-06 2.6e-06 2.6e-06
9 GO:0004298 threonine-type endopeptidase activity 15 0 0.08 9 1.2e-05 1.2e-05 1.2e-05
12 GO:0003743 translation initiation factor activity 28 0 0.14 10 5.4e-05 5.4e-05 5.4e-05
13 GO:0046982 protein heterodimerization activity 66 0 0.33 11 7.3e-05 7.3e-05 7.3e-05
14 GO:0031625 ubiquitin protein ligase binding 10 0 0.05 12 0.00038 0.00038 0.00038
15 GO:0004843 thiol-dependent ubiquitin-specific prote… 12 1 0.06 13 0.00038 0.00038 0.00038
16 GO:0004674 protein serine/threonine kinase activity 76 0 0.38 20 0.00184 0.00069 0.00184
17 GO:0003729 mRNA binding 30 1 0.15 16 0.00097 0.00097 0.00097
18 GO:0008168 methyltransferase activity 97 0 0.49 15 0.00081 0.00106 1.1e-06
19 GO:0000166 nucleotide binding 1117 5 5.64 17 0.00099 0.00118 1.3e-05
20 GO:0004402 histone acetyltransferase activity 9 0 0.05 18 0.00123 0.00123 0.00123
22 GO:0008137 NADH dehydrogenase (ubiquinone) activity 12 0 0.06 22 0.00204 0.00204 0.00204
23 GO:0051536 iron-sulfur cluster binding 35 0 0.18 21 0.00190 0.00211 0.00190
27 GO:0005524 ATP binding 786 2 3.97 24 0.00280 0.00280 0.00280
28 GO:0008536 Ran GTPase binding 14 0 0.07 25 0.00322 0.00322 0.00322
29 GO:0003684 damaged DNA binding 10 0 0.05 26 0.00330 0.00330 0.00330
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MF.ran.weight01)[sigTerms]),
      caption = 'MF terms overrepresented in random environments')
MF terms overrepresented in random environments
Term Definition PValue
GO:0003735 structural constituent of ribosome The action of a molecule that contributes to the structural integrity of the ribosome. 0.0000000
GO:0003723 RNA binding Interacting selectively and non-covalently with an RNA molecule or a portion thereof. 0.0000000
GO:0036459 thiol-dependent ubiquitinyl hydrolase activity Catalysis of the thiol-dependent hydrolysis of an ester, thioester, amide, peptide or isopeptide bond formed by the C-terminal glycine of ubiquitin. 0.0000001
GO:0003676 nucleic acid binding Interacting selectively and non-covalently with any nucleic acid. 0.0000001
GO:0061630 ubiquitin protein ligase activity Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S -> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond the between the C-terminal glycine and N-terminal methionine of ubiquitin residues. 0.0000002
GO:0003712 transcription coregulator activity A protein or a member of a complex that interacts specifically and non-covalently with a DNA-bound DNA-binding transcription factor to either activate or repress the transcription of specific genes. Coregulators often act by altering chromatin structure and modifications. For example, one class of transcription coregulators modifies chromatin structure through covalent modification of histones. A second ATP-dependent class modifies the conformation of chromatin. A third class modulates interactions of DNA-binding transcription factor with other transcription coregulators. 0.0000002
GO:0004842 ubiquitin-protein transferase activity Catalysis of the transfer of ubiquitin from one protein to another via the reaction X-Ub + Y –> Y-Ub + X, where both X-Ub and Y-Ub are covalent linkages. 0.0000003
GO:0003899 DNA-directed 5’-3’ RNA polymerase activity Catalysis of the reaction: nucleoside triphosphate + RNA(n) = diphosphate + RNA(n+1). Utilizes a DNA template, i.e. the catalysis of DNA-template-directed extension of the 3’-end of an RNA strand by one nucleotide at a time. Can initiate a chain ‘de novo’. 0.0000026
GO:0004298 threonine-type endopeptidase activity Catalysis of the hydrolysis of internal peptide bonds in a polypeptide chain by a mechanism in which the hydroxyl group of a threonine residue at the active center acts as a nucleophile. 0.0000119
GO:0003743 translation initiation factor activity Functions in the initiation of ribosome-mediated translation of mRNA into a polypeptide. 0.0000541
GO:0046982 protein heterodimerization activity Interacting selectively and non-covalently with a nonidentical protein to form a heterodimer. 0.0000729
GO:0031625 ubiquitin protein ligase binding Interacting selectively and non-covalently with a ubiquitin protein ligase enzyme, any of the E3 proteins. 0.0003755
GO:0004843 thiol-dependent ubiquitin-specific protease activity Catalysis of the thiol-dependent hydrolysis of a peptide bond formed by the C-terminal glycine of ubiquitin and another protein. 0.0003848
GO:0004674 protein serine/threonine kinase activity Catalysis of the reactions: ATP + protein serine = ADP + protein serine phosphate, and ATP + protein threonine = ADP + protein threonine phosphate. 0.0006932
GO:0003729 mRNA binding Interacting selectively and non-covalently with messenger RNA (mRNA), an intermediate molecule between DNA and protein. mRNA includes UTR and coding sequences, but does not contain introns. 0.0009711
GO:0008168 methyltransferase activity Catalysis of the transfer of a methyl group to an acceptor molecule. 0.0010650
GO:0000166 nucleotide binding Interacting selectively and non-covalently with a nucleotide, any compound consisting of a nucleoside that is esterified with (ortho)phosphate or an oligophosphate at any hydroxyl group on the ribose or deoxyribose. 0.0011756
GO:0004402 histone acetyltransferase activity Catalysis of the reaction: acetyl-CoA + histone = CoA + acetyl-histone. 0.0012320
GO:0008137 NADH dehydrogenase (ubiquinone) activity Catalysis of the reaction: NADH + H+ + ubiquinone = NAD+ + ubiquinol. 0.0020413
GO:0051536 iron-sulfur cluster binding Interacting selectively and non-covalently with an iron-sulfur cluster, a combination of iron and sulfur atoms. 0.0021125
GO:0005524 ATP binding Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. 0.0027954
GO:0008536 Ran GTPase binding Interacting selectively and non-covalently with Ran, a conserved Ras-like GTP-binding protein, implicated in nucleocytoplasmic transport, cell cycle progression, spindle assembly, nuclear organization and nuclear envelope (NE) assembly. 0.0032245
GO:0003684 damaged DNA binding Interacting selectively and non-covalently with damaged DNA. 0.0032998
GO:0003887 DNA-directed DNA polymerase activity Catalysis of the reaction: deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1); the synthesis of DNA from deoxyribonucleotide triphosphates in the presence of a DNA template and a 3’hydroxyl group. 0.0048384
GO:0060590 ATPase regulator activity Modulates the rate of ATP hydrolysis by an ATPase. 0.0048503
GO:0043022 ribosome binding Interacting selectively and non-covalently with any part of a ribosome. 0.0048964
showSigOfNodes(GOminus.MF, score(MF.ran.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 105 
## Number of Edges = 126 
## 
## $complete.dag
## [1] "A graph with 105 nodes."
showGroupDensity(GOminus.MF, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOminus.MF, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOminus.MF, orderedTerms[3], rm.one=FALSE)

Cellular component

orderedTerms <- names(sort(score(CC.ran.weight01)))
significant.weight01 <- score(CC.ran.weight01)[orderedTerms] <= 0.005
significant.lea      <- score(CC.ran.lea)[orderedTerms] <= 0.005
significant.elim     <- score(CC.ran.elim)[orderedTerms] <= 0.005
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.ran.sigTerms <- sigTerms

CC.all <- GenTable(GOminus.CC, elim=CC.ran.elim, weight01=CC.ran.weight01, lea=CC.ran.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
   CC.all[CC.all$GO.ID %in% sigTerms,],
   caption = "Most overrepresented Cellular Component terms in random environments.")
Most overrepresented Cellular Component terms in random environments.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0005840 ribosome 115 0 0.59 1 1.6e-19 1.6e-18 6.4e-22
2 GO:0005634 nucleus 531 2 2.71 2 6.7e-11 6.7e-11 < 1e-30
3 GO:0016592 mediator complex 24 0 0.12 3 1.3e-06 1.3e-06 1.3e-06
5 GO:0005681 spliceosomal complex 12 0 0.06 4 1.1e-05 1.1e-05 1.1e-05
6 GO:0090575 RNA polymerase II transcription factor c… 23 0 0.12 5 1.9e-05 1.9e-05 1.8e-07
7 GO:0016591 RNA polymerase II, holoenzyme 20 0 0.10 6 2.2e-05 2.2e-05 2.2e-07
8 GO:0008023 transcription elongation factor complex 10 0 0.05 10 0.00023 0.00023 0.00023
9 GO:0044451 nucleoplasm part 96 0 0.49 21 0.00198 0.00033 3.0e-19
10 GO:0044444 cytoplasmic part 511 3 2.61 7 2.8e-05 0.00067 5.3e-24
11 GO:0070603 SWI/SNF superfamily-type complex 19 0 0.10 12 0.00043 0.00098 0.00043
12 GO:0015934 large ribosomal subunit 14 0 0.07 15 0.00105 0.00105 0.00105
13 GO:0005669 transcription factor TFIID complex 7 0 0.04 16 0.00107 0.00107 0.00107
14 GO:0044455 mitochondrial membrane part 26 0 0.13 18 0.00123 0.00150 0.00123
15 GO:0008180 COP9 signalosome 7 0 0.04 23 0.00221 0.00221 0.00221
16 GO:0010008 endosome membrane 9 0 0.05 24 0.00234 0.00234 0.00234
17 GO:0032991 protein-containing complex 886 1 4.52 30 0.00400 0.00305 < 1e-30
18 GO:0019773 proteasome core complex, alpha-subunit c… 7 0 0.04 31 0.00434 0.00434 0.00434
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CC.ran.weight01)[sigTerms]),
      caption = 'CC terms overrepresented in random environments.')
CC terms overrepresented in random environments.
Term Definition PValue
GO:0005840 ribosome An intracellular organelle, about 200 A in diameter, consisting of RNA and protein. It is the site of protein biosynthesis resulting from translation of messenger RNA (mRNA). It consists of two subunits, one large and one small, each containing only protein and RNA. Both the ribosome and its subunits are characterized by their sedimentation coefficients, expressed in Svedberg units (symbol: S). Hence, the prokaryotic ribosome (70S) comprises a large (50S) subunit and a small (30S) subunit, while the eukaryotic ribosome (80S) comprises a large (60S) subunit and a small (40S) subunit. Two sites on the ribosomal large subunit are involved in translation, namely the aminoacyl site (A site) and peptidyl site (P site). Ribosomes from prokaryotes, eukaryotes, mitochondria, and chloroplasts have characteristically distinct ribosomal proteins. 0.0000000
GO:0005634 nucleus A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell’s chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent. 0.0000000
GO:0016592 mediator complex A protein complex that interacts with the carboxy-terminal domain of the largest subunit of RNA polymerase II and plays an active role in transducing the signal from a transcription factor to the transcriptional machinery. The mediator complex is required for activation of transcription of most protein-coding genes, but can also act as a transcriptional corepressor. The Saccharomyces complex contains several identifiable subcomplexes: a head domain comprising Srb2, -4, and -5, Med6, -8, and -11, and Rox3 proteins; a middle domain comprising Med1, -4, and -7, Nut1 and -2, Cse2, Rgr1, Soh1, and Srb7 proteins; a tail consisting of Gal11p, Med2p, Pgd1p, and Sin4p; and a regulatory subcomplex comprising Ssn2, -3, and -8, and Srb8 proteins. Metazoan mediator complexes have similar modular structures and include homologs of yeast Srb and Med proteins. 0.0000013
GO:0005681 spliceosomal complex Any of a series of ribonucleoprotein complexes that contain snRNA(s) and small nuclear ribonucleoproteins (snRNPs), and are formed sequentially during the spliceosomal splicing of one or more substrate RNAs, and which also contain the RNA substrate(s) from the initial target RNAs of splicing, the splicing intermediate RNA(s), to the final RNA products. During cis-splicing, the initial target RNA is a single, contiguous RNA transcript, whether mRNA, snoRNA, etc., and the released products are a spliced RNA and an excised intron, generally as a lariat structure. During trans-splicing, there are two initial substrate RNAs, the spliced leader RNA and a pre-mRNA. 0.0000108
GO:0090575 RNA polymerase II transcription factor complex A transcription factor complex that acts at a regulatory region of a gene transcribed by RNA polymerase II. 0.0000190
GO:0016591 RNA polymerase II, holoenzyme A nuclear DNA-directed RNA polymerase complex containing an RNA polymerase II core enzyme as well as additional proteins and transcription factor complexes, that are capable of promoter recognition and transcription initiation from an RNA polymerase II promoter in vivo. These additional components may include general transcription factor complexes TFIIA, TFIID, TFIIE, TFIIF, or TFIIH, as well as Mediator, SWI/SNF, GCN5, or SRBs and confer the ability to recognize promoters. 0.0000217
GO:0008023 transcription elongation factor complex Any protein complex that interacts with RNA polymerase II to increase (positive transcription elongation factor) or reduce (negative transcription elongation factor) the rate of transcription elongation. 0.0002321
GO:0044451 nucleoplasm part Any constituent part of the nucleoplasm, that part of the nuclear content other than the chromosomes or the nucleolus. 0.0003254
GO:0044444 cytoplasmic part Any constituent part of the cytoplasm, all of the contents of a cell excluding the plasma membrane and nucleus, but including other subcellular structures. 0.0006735
GO:0070603 SWI/SNF superfamily-type complex A protein complex that contains an ortholog of the Saccharomyces ATPase Swi2/Snf2 as one of the catalytic subunit components (ATPase) and mediates assembly of nucleosomes, changes to the spacing or structure of nucleosomes, or some combination of those activities in a manner that requires ATP. 0.0009792
GO:0015934 large ribosomal subunit The larger of the two subunits of a ribosome. Two sites on the ribosomal large subunit are involved in translation, namely the aminoacyl site (A site) and peptidyl site (P site). 0.0010546
GO:0005669 transcription factor TFIID complex A complex composed of TATA binding protein (TBP) and TBP associated factors (TAFs); the total mass is typically about 800 kDa. Most of the TAFs are conserved across species. In TATA-containing promoters for RNA polymerase II (Pol II), TFIID is believed to recognize at least two distinct elements, the TATA element and a downstream promoter element. TFIID is also involved in recognition of TATA-less Pol II promoters. Binding of TFIID to DNA is necessary but not sufficient for transcription initiation from most RNA polymerase II promoters. 0.0010651
GO:0044455 mitochondrial membrane part Any constituent part of a mitochondrial membrane, either of the lipid bilayers that surround the mitochondrion and form the mitochondrial envelope. 0.0015008
GO:0008180 COP9 signalosome A protein complex that catalyzes the deneddylation of proteins, including the cullin component of SCF ubiquitin E3 ligase; deneddylation increases the activity of cullin family ubiquitin ligases. The signalosome is involved in many regulatory process, including some which control development, in many species; also regulates photomorphogenesis in plants; in many species its subunits are highly similar to those of the proteasome. 0.0022149
GO:0010008 endosome membrane The lipid bilayer surrounding an endosome. 0.0023386
GO:0032991 protein-containing complex A stable assembly of two or more macromolecules, i.e. proteins, nucleic acids, carbohydrates or lipids, in which at least one component is a protein and the constituent parts function together. 0.0030549
GO:0019773 proteasome core complex, alpha-subunit complex The proteasome core subcomplex that constitutes the two outer rings of the proteasome core complex. An example of this component is found in Mus musculus. 0.0043397
showSigOfNodes(GOminus.CC, score(CC.ran.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 100 
## Number of Edges = 191 
## 
## $complete.dag
## [1] "A graph with 100 nodes."
showGroupDensity(GOminus.CC, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOminus.CC, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOminus.CC, orderedTerms[3], rm.one=FALSE)

Comparison between the two ordering of genes

I want to compare these results with those from 2020-01-14, where I used the \(p\) value of the differential expression analysis to order the genes. I can import the data from 2020-01-14, but I need to do it in a new environment to prevent overwriting the current results.

load(ENRICHMENT, ex <- new.env())

Biological process

allTerms <- usedGO(GOdata.BP)
BP.pvalue.sigTerms <- with(ex, BP.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat  = allTerms %in% c(BP.reg.sigTerms, BP.ran.sigTerms),
                             PValue = allTerms %in% BP.pvalue.sigTerms)))

# New terms:
as.data.frame(Term(GOTERM[setdiff(c(BP.reg.sigTerms, BP.ran.sigTerms), BP.pvalue.sigTerms)]))
##            Term(GOTERM[setdiff(c(BP.reg.sigTerms, BP.ran.sigTerms), BP.pvalue.sigTerms)])
## GO:0006814                                                           sodium ion transport
## GO:0006030                                                       chitin metabolic process
## GO:0006486                                                          protein glycosylation
## GO:0005975                                                 carbohydrate metabolic process
## GO:0015074                                                                DNA integration
## GO:0006811                                                                  ion transport
## GO:0007156                homophilic cell adhesion via plasma membrane adhesion molecules
## GO:0070588                                            calcium ion transmembrane transport
## GO:0007154                                                             cell communication
## GO:0007018                                                     microtubule-based movement
## GO:0006412                                                                    translation
## GO:0006511                                  ubiquitin-dependent protein catabolic process
## GO:0016579                                                       protein deubiquitination
## GO:0000398                                                 mRNA splicing, via spliceosome
## GO:0006397                                                                mRNA processing
## GO:0006351                                                   transcription, DNA-templated
## GO:0016567                                                         protein ubiquitination
## GO:0006396                                                                 RNA processing
## GO:0006886                                                intracellular protein transport
## GO:0006914                                                                      autophagy
## GO:0006367                       transcription initiation from RNA polymerase II promoter
## GO:0006352                                        DNA-templated transcription, initiation
## GO:0030163                                                      protein catabolic process
## GO:0006383                                            transcription by RNA polymerase III
## GO:0006413                                                       translational initiation
## GO:0043628                                                        ncRNA 3'-end processing
## GO:0000184            nuclear-transcribed mRNA catabolic process, nonsense-mediated decay
## GO:0000245                                                  spliceosomal complex assembly
## GO:0016570                                                           histone modification
## GO:0006281                                                                     DNA repair
## GO:0006368                       transcription elongation from RNA polymerase II promoter
## GO:0002098                                               tRNA wobble uridine modification
## GO:0006744                                                ubiquinone biosynthetic process
## GO:0016573                                                            histone acetylation
## GO:0016236                                                                 macroautophagy
## GO:0043631                                                            RNA polyadenylation
## GO:0035304                                        regulation of protein dephosphorylation
## GO:0045944                      positive regulation of transcription by RNA polymerase II
## GO:0008033                                                                tRNA processing
## GO:0001510                                                                RNA methylation
## GO:0071985                                            multivesicular body sorting pathway
## GO:0006888                                         ER to Golgi vesicle-mediated transport
## GO:0043094                                            cellular metabolic compound salvage
## GO:0048015                                        phosphatidylinositol-mediated signaling
## GO:0009894                                                regulation of catabolic process
## GO:0016226                                                   iron-sulfur cluster assembly
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(BP.pvalue.sigTerms, c(BP.reg.sigTerms, BP.ran.sigTerms))]))
##            Term(GOTERM[setdiff(BP.pvalue.sigTerms, c(BP.reg.sigTerms, BP.ran.sigTerms))])
## GO:0006468                                                        protein phosphorylation
## GO:0055114                                                    oxidation-reduction process
## GO:0007165                                                            signal transduction
## GO:0006355                                     regulation of transcription, DNA-templated
## GO:0006470                                                      protein dephosphorylation
## GO:0006979                                                   response to oxidative stress
## GO:1901642                                             nucleoside transmembrane transport
## GO:0043161              proteasome-mediated ubiquitin-dependent protein catabolic process
## GO:0046942                                                      carboxylic acid transport

Molecular function

allTerms <- usedGO(GOdata.MF)
MF.pvalue.sigTerms <- with(ex, MF.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat  = allTerms %in% c(MF.reg.sigTerms, MF.ran.sigTerms),
                             PValue = allTerms %in% MF.pvalue.sigTerms)))

# New terms:
as.data.frame(Term(GOTERM[setdiff(c(MF.reg.sigTerms, MF.ran.sigTerms), MF.pvalue.sigTerms)]))
##            Term(GOTERM[setdiff(c(MF.reg.sigTerms, MF.ran.sigTerms), MF.pvalue.sigTerms)])
## GO:0043565                                                  sequence-specific DNA binding
## GO:0008061                                                                 chitin binding
## GO:0005272                                                        sodium channel activity
## GO:0004888                                      transmembrane signaling receptor activity
## GO:0008417                                                    fucosyltransferase activity
## GO:0005230                                extracellular ligand-gated ion channel activity
## GO:0005328                                     neurotransmitter:sodium symporter activity
## GO:0051015                                                         actin filament binding
## GO:0003777                                                     microtubule motor activity
## GO:0020037                                                                   heme binding
## GO:0004096                                                              catalase activity
## GO:0031409                                                                pigment binding
## GO:0004983                                               neuropeptide Y receptor activity
## GO:0015293                                                             symporter activity
## GO:0008378                                                 galactosyltransferase activity
## GO:0004601                                                            peroxidase activity
## GO:0004190                                           aspartic-type endopeptidase activity
## GO:0008528                                    G protein-coupled peptide receptor activity
## GO:0003735                                             structural constituent of ribosome
## GO:0003723                                                                    RNA binding
## GO:0036459                                 thiol-dependent ubiquitinyl hydrolase activity
## GO:0003676                                                           nucleic acid binding
## GO:0003712                                             transcription coregulator activity
## GO:0004842                                         ubiquitin-protein transferase activity
## GO:0003899                                     DNA-directed 5'-3' RNA polymerase activity
## GO:0004298                                          threonine-type endopeptidase activity
## GO:0003743                                         translation initiation factor activity
## GO:0046982                                            protein heterodimerization activity
## GO:0031625                                               ubiquitin protein ligase binding
## GO:0004843                           thiol-dependent ubiquitin-specific protease activity
## GO:0004674                                       protein serine/threonine kinase activity
## GO:0003729                                                                   mRNA binding
## GO:0008168                                                     methyltransferase activity
## GO:0000166                                                             nucleotide binding
## GO:0004402                                             histone acetyltransferase activity
## GO:0008137                                       NADH dehydrogenase (ubiquinone) activity
## GO:0051536                                                    iron-sulfur cluster binding
## GO:0008536                                                             Ran GTPase binding
## GO:0003684                                                            damaged DNA binding
## GO:0003887                                           DNA-directed DNA polymerase activity
## GO:0060590                                                      ATPase regulator activity
## GO:0043022                                                               ribosome binding
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(MF.pvalue.sigTerms, c(MF.reg.sigTerms, MF.ran.sigTerms))]))
##            Term(GOTERM[setdiff(MF.pvalue.sigTerms, c(MF.reg.sigTerms, MF.ran.sigTerms))])
## GO:0005525                                                                    GTP binding
## GO:0004672                                                        protein kinase activity
## GO:0005507                                                             copper ion binding
## GO:0008138                         protein tyrosine/serine/threonine phosphatase activity
## GO:0004198                         calcium-dependent cysteine-type endopeptidase activity
## GO:0003924                                                                GTPase activity
## GO:0042626               ATPase activity, coupled to transmembrane movement of substances
## GO:0005337                                  nucleoside transmembrane transporter activity
## GO:0016787                                                             hydrolase activity
## GO:0008017                                                            microtubule binding
## GO:0016840                                                 carbon-nitrogen lyase activity
## GO:0047631                                              ADP-ribose diphosphatase activity
## GO:0005506                                                               iron ion binding

Cellular component

allTerms <- usedGO(GOdata.CC)
CC.pvalue.sigTerms <- with(ex, CC.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat  = allTerms %in% c(CC.reg.sigTerms, CC.ran.sigTerms),
                             PValue = allTerms %in% CC.pvalue.sigTerms)))

# New terms:
as.data.frame(Term(GOTERM[setdiff(c(CC.reg.sigTerms, CC.ran.sigTerms), CC.pvalue.sigTerms)]))
##            Term(GOTERM[setdiff(c(CC.reg.sigTerms, CC.ran.sigTerms), CC.pvalue.sigTerms)])
## GO:0098797                                                plasma membrane protein complex
## GO:0031012                                                           extracellular matrix
## GO:0034704                                                        calcium channel complex
## GO:0005874                                                                    microtubule
## GO:0005929                                                                         cilium
## GO:0005858                                                        axonemal dynein complex
## GO:0045211                                                          postsynaptic membrane
## GO:0005840                                                                       ribosome
## GO:0005634                                                                        nucleus
## GO:0016592                                                               mediator complex
## GO:0005681                                                           spliceosomal complex
## GO:0016591                                                  RNA polymerase II, holoenzyme
## GO:0008023                                        transcription elongation factor complex
## GO:0044451                                                               nucleoplasm part
## GO:0044444                                                               cytoplasmic part
## GO:0070603                                               SWI/SNF superfamily-type complex
## GO:0015934                                                        large ribosomal subunit
## GO:0005669                                             transcription factor TFIID complex
## GO:0044455                                                    mitochondrial membrane part
## GO:0008180                                                               COP9 signalosome
## GO:0010008                                                              endosome membrane
## GO:0032991                                                     protein-containing complex
## GO:0019773                                 proteasome core complex, alpha-subunit complex
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(CC.pvalue.sigTerms, c(CC.reg.sigTerms, CC.ran.sigTerms))]))
##            Term(GOTERM[setdiff(CC.pvalue.sigTerms, c(CC.reg.sigTerms, CC.ran.sigTerms))])
## GO:0005887                                          integral component of plasma membrane
## GO:0098800                                   inner mitochondrial membrane protein complex

Session info

save(allgenes2GO,
     GOdata.BP,  BP.reg.elim, BP.reg.weight01, BP.reg.lea, BP.reg.sigTerms,
     GOminus.BP, BP.ran.elim, BP.ran.weight01, BP.ran.lea, BP.ran.sigTerms,
     GOdata.MF,  MF.reg.elim, MF.reg.weight01, MF.reg.lea, MF.reg.sigTerms,
     GOminus.MF, MF.ran.elim, MF.ran.weight01, MF.ran.lea, MF.ran.sigTerms,
     GOdata.CC,  CC.reg.elim, CC.reg.weight01, CC.reg.lea, CC.reg.sigTerms,
     GOminus.CC, CC.ran.elim, CC.ran.weight01, CC.ran.lea, CC.ran.sigTerms,
     file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] Rgraphviz_2.30.0         variancePartition_1.16.1 scales_1.1.0            
##  [4] foreach_1.5.0            ggplot2_3.3.0            limma_3.42.2            
##  [7] knitr_1.28               topGO_2.38.1             SparseM_1.78            
## [10] GO.db_3.10.0             AnnotationDbi_1.48.0     IRanges_2.20.2          
## [13] S4Vectors_0.24.4         Biobase_2.46.0           graph_1.64.0            
## [16] BiocGenerics_0.32.0     
## 
## loaded via a namespace (and not attached):
##  [1] bit64_0.9-7         splines_3.6.3       gtools_3.8.2       
##  [4] assertthat_0.2.1    statmod_1.4.34      highr_0.8          
##  [7] blob_1.2.1          yaml_2.2.1          progress_1.2.2     
## [10] pillar_1.4.3        RSQLite_2.2.0       lattice_0.20-41    
## [13] glue_1.4.0          digest_0.6.25       minqa_1.2.4        
## [16] colorspace_1.4-1    htmltools_0.4.0     Matrix_1.2-18      
## [19] plyr_1.8.6          pkgconfig_2.0.3     purrr_0.3.4        
## [22] gdata_2.18.0        BiocParallel_1.20.1 lme4_1.1-23        
## [25] tibble_3.0.1        ellipsis_0.3.0      withr_2.2.0        
## [28] pbkrtest_0.4-8.6    magrittr_1.5        crayon_1.3.4       
## [31] memoise_1.1.0       evaluate_0.14       doParallel_1.0.15  
## [34] nlme_3.1-147        MASS_7.3-51.6       gplots_3.0.3       
## [37] tools_3.6.3         prettyunits_1.1.1   hms_0.5.3          
## [40] lifecycle_0.2.0     matrixStats_0.56.0  stringr_1.4.0      
## [43] munsell_0.5.0       colorRamps_2.3      compiler_3.6.3     
## [46] caTools_1.18.0      rlang_0.4.6         nloptr_1.2.2.1     
## [49] iterators_1.0.12    bitops_1.0-6        rmarkdown_2.1      
## [52] boot_1.3-25         gtable_0.3.0        codetools_0.2-16   
## [55] DBI_1.1.0           reshape2_1.4.4      R6_2.4.1           
## [58] dplyr_0.8.5         bit_1.1-15.2        KernSmooth_2.23-17 
## [61] stringi_1.4.6       Rcpp_1.0.4.6        vctrs_0.2.4        
## [64] tidyselect_1.0.0    xfun_0.13

Alexa, Adrian, and Jorg Rahnenfuhrer. 2019. TopGO: Enrichment Analysis for Gene Ontology.